GANESH: A Sequence Analysis and Display Package

GANESH Demonstration - WAGR region of chromosome 11

GANESH is derived from the Active11Db project. The general objectives of Active11Db were to develop update and revision mechanisms for the active maintenance of data derived from multiple genomic data sources, and to evaluate the results by applying them to the human chromosome 11 database, 11Db, maintained in the Department of Biochemistry, Imperial College.

11Db is a collection of data and annotations on the WAGR region, a 14 Mb region of human chromosome 11. It is representative of a wide class of biological databases that compile data from multiple sources. The general aim of the project was to develop methods that would enable such databases to update themselves via the Internet on a regular schedule; 11Db is used as a source of specific problems to drive the development of general solutions.

Whilst developing Active11Db, it became apparent that the methods used and the software being developed could be adapted for other sequence regions. GANESH was developed from this premise; it has since been adapted for several regions and successfully installed by other research groups.

GANESH Structure

From the outset it was apparent that there were two separate problems to resolve: the analysis and automatic update of the contig sequences, and a visual front end for viewing the analysis data. The update and revision software is written in Perl, mainly because of its ease of text and file handling, but also because Perl is widely used in bioinformatics. MySQL was chosen as the database for the sequence data, mainly because it is freely available (open source) but also because it is easy to install. The latter is particularly important if GANESH is to be freely distributable, as it simplifies installation and implementation.
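The incremental update idea can be sketched as follows. This is a hypothetical illustration, not GANESH's actual update code (which is written in Perl); it is given in Java only to match the language of the viewer. It assumes that each contig sequence is identified by an accession with an integer version number, and that the current accession/version pairs can be fetched from the remote source; the class and field names are invented for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch: decide which contig sequences need (re)analysis by
 * comparing the accession/version pairs stored locally with those currently
 * reported by the remote sequence source. Not GANESH's real update code.
 */
final class UpdatePlan {
    final Set<String> added = new TreeSet<>();     // new at the source: analyse
    final Set<String> revised = new TreeSet<>();   // version differs: re-analyse
    final Set<String> withdrawn = new TreeSet<>(); // no longer at the source: retire

    static UpdatePlan compare(Map<String, Integer> stored, Map<String, Integer> remote) {
        UpdatePlan plan = new UpdatePlan();
        for (Map.Entry<String, Integer> e : remote.entrySet()) {
            Integer old = stored.get(e.getKey());
            if (old == null) {
                plan.added.add(e.getKey());
            } else if (!old.equals(e.getValue())) {
                plan.revised.add(e.getKey());
            }
        }
        // Anything held locally but absent remotely has been withdrawn.
        for (String accession : stored.keySet()) {
            if (!remote.containsKey(accession)) {
                plan.withdrawn.add(accession);
            }
        }
        return plan;
    }
}
```

Only the `added` and `revised` sets would then need to be passed to the analysis tools, which is what keeps a regular, scheduled update affordable.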

Java was chosen for the visual front end because of its suitability for the web and its platform independence as a standalone application; both an applet and an application version of the viewer have been produced. Java also allowed the use of proprietary genome software, in particular the Neomorphic Genome Software Development Kit (NGSDK), which provides programming tools that simplify the development of genome annotation software such as the GANESH viewer.

The two parts of GANESH are independent and can be used separately. In particular, the GANESH viewer can display any sequence and analysis data, provided the data are stored in the correct format in the database. In practice, however, it is usually most convenient to use the two parts together as a single combined system.
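Because the viewer depends only on the database contents, the contract between the two parts is essentially a table of features. The following sketch assumes a hypothetical, simplified feature record (sequence identifier, source tool, 1-based inclusive start and end, strand); GANESH's real schema is not described here and may well differ. It shows the kind of query a viewer needs when the user scrolls or zooms: find every feature that overlaps the visible window.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** One analysis result on a sequence. Hypothetical layout, not GANESH's schema. */
record Feature(String sequenceId, String tool, int start, int end, char strand) {
    Feature {
        if (start > end) throw new IllegalArgumentException("start > end");
    }
}

final class FeatureWindow {
    /** Features on sequenceId overlapping the 1-based inclusive window [from, to], sorted by start. */
    static List<Feature> overlapping(List<Feature> all, String sequenceId, int from, int to) {
        List<Feature> hits = new ArrayList<>();
        for (Feature f : all) {
            // Two closed intervals overlap iff each one starts before the other ends.
            if (f.sequenceId().equals(sequenceId) && f.start() <= to && f.end() >= from) {
                hits.add(f);
            }
        }
        hits.sort(Comparator.comparingInt(Feature::start));
        return hits;
    }
}
```

In a database-backed viewer the same overlap test would normally be pushed into the SQL query rather than applied in memory.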

Initial Analysis and Automatic Update

Java Viewer

The GANESH system has so far been successfully adapted for several projects, including regions of human chromosomes 3, 11, 14, 15 and 20, the complete human chromosome 21, and mouse chromosomes 2, 4 and 12. It has also been installed at a second site, in the School of Medicine at Imperial College, with other installations planned or in progress, and it supports users in several countries annotating regions of numerous chromosomes.

The objective of the project has been to produce software that enables the automatic analysis of any genomic region and the viewing of that analysis data in context. The software produced so far addresses this problem: it provides a stable update and analysis package that can be extended to include other genomic analysis tools. The viewing package allows the analysis data to be visualized at increasing levels of detail and is fully configurable for any new genomic tools added to the analysis and update software.

Further Developments

Several improvements could be made to GANESH, including:

1. Re-implementation of the Neomorphic software, because of its limitations:
   a. GANESH is intended to be open source, which is complicated by the fact that the Neomorphic software is licensed.
   b. Certain aspects of the applet version of GANESH suffer from the severe speed limitations previously mentioned. In particular, there is an extreme delay at the point where the glyphs are added to the display. This affects only the applet and can be overcome by using the latest browser versions. Because the Neomorphic software is licensed and its source code is not accessible, it is not possible to isolate the cause of the speed problem; it is hoped that rewriting the Neomorphic components will overcome it.
2. The display of unfinished sequence annotation could be improved by attempting to order the sub-sequences. This could be achieved by adapting some or all of the GigAssembler system (Kent W.J. et al.), which was developed to assemble the entire human draft genome sequence. The scale of that project far exceeds anything required by GANESH, but it should be possible to adapt the underlying algorithms to complete the unfinished sequences. Alternatively, given the much smaller scale of the problem, it may be better to develop bespoke algorithms to finish the sequences. Any finishing method could be further developed to assemble the entire displayed region into a proposed consensus sequence.
3. Ensembl is becoming the benchmark for the annotation of the human and mouse genomes, and other species are likely to follow. A possible development for GANESH is to link it to Ensembl, either to compare annotations or to combine the Ensembl annotations with those of GANESH.
4. Making GANESH compliant with the Distributed Sequence Annotation System (DAS). DAS allows laboratories to publish their own annotations, which can then be viewed graphically by the wider community, provided they are DAS compliant. A possible development is to ensure that GANESH annotations are in the format DAS requires and to incorporate GANESH into the DAS system, adding another level to the viewing of the data.
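The sub-sequence ordering proposed above can be illustrated with the classic greedy overlap heuristic: repeatedly join the pair of fragments with the longest exact suffix/prefix overlap. This is a deliberately simple stand-in, not GigAssembler's algorithm. It assumes error-free fragments that all lie on the same strand, and it ignores reverse complements, repeats and fragments contained inside others, all of which a real finishing method would have to handle. The class name and the `minOverlap` threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Greedy overlap merge: a textbook heuristic (not GigAssembler) for joining
 * unordered sub-sequences. Assumes exact, same-strand overlaps only.
 */
final class GreedyOrder {
    /** Length of the longest suffix of a equal to a prefix of b (at least minOverlap), else 0. */
    static int overlap(String a, String b, int minOverlap) {
        for (int len = Math.min(a.length(), b.length()); len >= Math.max(1, minOverlap); len--) {
            if (a.regionMatches(a.length() - len, b, 0, len)) return len;
        }
        return 0;
    }

    /** Repeatedly merge the ordered pair with the largest overlap until no overlap remains. */
    static List<String> merge(List<String> fragments, int minOverlap) {
        List<String> pool = new ArrayList<>(fragments);
        while (true) {
            int bestI = -1, bestJ = -1, best = 0;
            for (int i = 0; i < pool.size(); i++) {
                for (int j = 0; j < pool.size(); j++) {
                    if (i == j) continue;
                    int o = overlap(pool.get(i), pool.get(j), minOverlap);
                    if (o > best) { best = o; bestI = i; bestJ = j; }
                }
            }
            // Pieces with no qualifying overlap stay as separate, unordered contigs.
            if (best == 0) return pool;
            String merged = pool.get(bestI) + pool.get(bestJ).substring(best);
            // Remove the higher index first so the lower index remains valid.
            pool.remove(Math.max(bestI, bestJ));
            pool.remove(Math.min(bestI, bestJ));
            pool.add(merged);
        }
    }
}
```

For example, merging `ACGTGGTT`, `GGTTCCAA` and `CCAATTTT` with `minOverlap = 4` yields the single sequence `ACGTGGTTCCAATTTT`. The exhaustive pairwise search is quadratic per merge, which is acceptable at the scale of a single unfinished clone but not at genome scale.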
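As a rough sketch of what DAS compliance involves on the server side, the following code serialises analysis results as a DAS 1.x features response. Element names follow the DASGFF document layout (DASGFF, GFF, SEGMENT, FEATURE, TYPE, METHOD, START, END, ORIENTATION) as we understand it from the DAS specification and should be checked against it before use; only a minimal subset of elements is emitted, and the `Hit` record, the class name and the `href` value are hypothetical.

```java
import java.util.List;

/** A single annotation to publish. Hypothetical stand-in for GANESH's stored features. */
record Hit(String id, String type, String method, int start, int end, char strand) {}

/** Minimal DAS 1.x-style "features" document builder (subset of DASGFF elements). */
final class DasFeatures {
    /** Escape the XML special characters; '&' must be replaced first. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    static String toXml(String href, String segmentId, int start, int stop, List<Hit> hits) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" standalone=\"no\"?>\n");
        sb.append("<DASGFF>\n");
        sb.append("  <GFF version=\"1.0\" href=\"").append(escape(href)).append("\">\n");
        sb.append("    <SEGMENT id=\"").append(escape(segmentId))
          .append("\" start=\"").append(start)
          .append("\" stop=\"").append(stop).append("\">\n");
        for (Hit h : hits) {
            sb.append("      <FEATURE id=\"").append(escape(h.id())).append("\">\n")
              .append("        <TYPE id=\"").append(escape(h.type())).append("\">")
              .append(escape(h.type())).append("</TYPE>\n")
              .append("        <METHOD id=\"").append(escape(h.method())).append("\">")
              .append(escape(h.method())).append("</METHOD>\n")
              .append("        <START>").append(h.start()).append("</START>\n")
              .append("        <END>").append(h.end()).append("</END>\n")
              .append("        <ORIENTATION>").append(h.strand()).append("</ORIENTATION>\n")
              .append("      </FEATURE>\n");
        }
        sb.append("    </SEGMENT>\n  </GFF>\n</DASGFF>\n");
        return sb.toString();
    }
}
```

A DAS server would return a document like this in response to a `features` request for the segment; serving it over HTTP is omitted here.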

GANESH has been developed with a grant awarded through the BBSRC/EPSRC Bioinformatics Initiative, and a demonstration is available. The contributors to the development of the system are:

Derek Huntley - Department of Computing, Imperial College, London, UK

Sasivimol Kittivoravitkul - Department of Computing, Imperial College, London, UK

Dr Holger Hummerich - MRC Prion Unit / Department of Neurogenetics, Institute of Neurogenetics, London, UK

Prof Marek Sergot - Department of Computing, Imperial College, London, UK

Prof Peter Little - School of Biochemistry & Molecular Genetics, University of New South Wales, Sydney 2052, Australia

Dr Damian Smedley - School of Medicine, Imperial College, London, UK

Prof Mark McCarthy - School of Medicine, Imperial College, London, UK

Manuel Cardoso - Department of Computing, Imperial College, London, UK

Chris Ioannou - Department of Computing, Imperial College, London, UK
